Abstract

In this paper, we investigate random walk based token circulation in
dynamic environments subject to failures. We describe hypotheses on the
dynamic environment under which random walks satisfy the important
property that the token visits every node infinitely often. The randomness
of this scheme allows it to work on any topology and to require no
adaptation after a topological change, which is a desirable property for
applications to dynamic systems. For random walks to be a traversal
scheme and to solve the concurrency problem, one needs to guarantee
that exactly one token circulates in the system. In the presence of
transient failures, configurations with multiple tokens or with no token can
occur. The meeting property of random walks solves the cases with
multiple tokens. The reloading wave mechanism we propose, together with
timeouts, makes it possible to detect and solve cases with no token. This
traversal scheme is self-stabilizing and universal, meaning that it needs no
assumption on the system topology. We describe conditions on the dynamicity
(with a local detection criterion) under which the algorithm is tolerant to
dynamic reconfigurations. We conclude with a study of the time between
two visits of the token to a node, which we use to tune the parameters of
the reloading wave mechanism according to some system characteristics.

1 Introduction 

Concurrency control is one of the most important requirements in distributed
systems and has been investigated for 40 years. The emergence of peer-to-peer
networks and of wireless mobile networks has renewed the context of the design
of protocols used in distributed applications. These networks require new
models and new solutions that take their intrinsic dynamicity into account.

In this paper, we focus on token circulation based solutions: the concurrent 
access to the shared resource is managed by a "token" message that circulates in 
the distributed system. We present a self-stabilizing universal traversal scheme 
based on a random walk, with a particular focus on dynamic systems. 

In distributed computing, a random walk is implemented by a Token message
that is sent from node to node in a random fashion: each time a node receives
a Token message, it executes a piece of code that only the token owner is
allowed to execute, and then forwards the token to one of its neighbors chosen
at random.

Properties of random walks make it possible to design a traversal scheme using
only local information [AKL+79]: such a scheme is not designed for one
particular topology and needs no adaptation to fit other ones. Moreover, random
walks offer the interesting property of adapting to the insertion or deletion of
nodes or links in the network without modifying any of the functioning rules.
With the increasing dynamicity of networks, these features are becoming crucial:
redesigning a new browsing scheme at each modification of the topology is
impossible, and flooding-based solutions can lead to the congestion of the
network.

An important result of this paradigm is that the token will eventually visit 
(with probability 1) all the nodes of a system, even if it is impossible to capture 
an upper bound on the time required to visit all the nodes of the system. 

Random walk based traversal schemes have been used in many theoretical
distributed computing problems, such as mutual exclusion [IJ90] and spanning
tree construction [BIZ89], and at the application level, such as decentralized
recommender systems [KLMT10] and concurrency management in Grid computing
[Ciu10].

The random walk traversal scheme can be affected by different errors. In this
paper, we manage them in a self-stabilizing fashion, as introduced by Dijkstra in
[Dij74]. After a fault, a self-stabilizing system is led to an arbitrary configuration
but eventually recovers a normal behavior and then satisfies the specification of
the problem.

Related works 

The token circulation can be affected by only two errors:

• the absence of tokens; 

• the presence of more than one token. 

Both faults are violations of global properties of the system. However, the
second fault may entail (and in our algorithm, will eventually entail) the local
property that a node holds several tokens at once. This node can then remove
all of them but one, which leads, once all duplicate tokens are removed, to a
correct configuration. The first fault has no locally checkable certificate, so
a global mechanism (meaning a mechanism involving all nodes) has to be put in
place. This fault is a communication deadlock: all nodes are waiting for
messages and there are no messages on the communication links. The solution
proposed by [GM91] is to use a timeout: when a node has not seen the token for
a long time, it creates a new one. In [Var00], the author proposes a message
passing adaptation of Dijkstra's algorithm [Dij74]. In particular, a self-stabilizing
token circulation algorithm on an undirected ring is presented. Communication
deadlock is solved by a timeout process in a distinguished node called the root.
Nevertheless, duplication of tokens may occur.

To solve this problem, the author introduces the counter flushing paradigm
and designs a self-stabilizing token circulation algorithm. The idea of counter
flushing is used in numerous papers dealing with self-stabilization in the message
passing model, as in [CW05, HV01]. This idea relies on a bound on the time
between two successive receptions of the token, which we cannot have with a
random walk: starting from a configuration in which there is a single token,
a token is eventually created unnecessarily, which violates the specification.

In [DSW06], the authors use randomly circulating tokens (which they call agents)
to broadcast information in communication groups. To cope with the situation
where no agent exists in the system, the authors use a timer based on the
cover time of an agent ($k \times n^3$). They note as a concluding remark: "The
requirements will hold with higher probability if we enlarge the parameter k for
ensuring the cover time [...]". In our case, obtaining a single token is a
strong requirement, and a parameter k that merely increases the probability
of reaching a legitimate configuration cannot be used.

Work has been carried out on the random walk token circulation paradigm (see
[Coo11]), in particular to reduce the average time between two successive visits
by the token or to attain a given stationary distribution of the token locations
([IKOY02, NOSY10]).

Contribution 

Our random walk based solution is self-stabilizing: it tolerates transient failures.
If there is no token in the system, the missing token is recreated upon timeout.
Our solution is decentralized (no distinguished node), and only the expected time
for a random walk based token to cover the system can be captured. Every node
is a candidate to regenerate the token, and whatever timeout period is chosen,
timeouts alone could prevent the system from ever stabilizing: unnecessary
tokens could be produced infinitely often.

No node can ascertain that the token does not exist, due to the way the 
token moves. However, the longer a node has not seen the token, the more 
probable it is that no token exists. Thus, upon a timeout, a node should create 
a new token. To avoid this creation in cases where a token already exists, tokens
periodically inform nodes of their existence, which inhibits token creation. We
call this process a reloading wave. Each node that has been previously visited 
by this token receives the reloading wave, resets its timer and is thus forbidden 
to create a duplicated token for the next period. The only case when a node 
creates a new token after the timeout period, corresponds to a situation in which 
the node has never been visited by the token. 

The reloading wave information should be broadcast efficiently and reliably 
through the network. The reloading wave is defined in connection with the 
token. We only use the information collected and stored by the token through 
its traversal. Thus there is no additional protocol. In such a token, called a 
circulating word, a dynamic self-stabilizing tree is maintained, through which 
the information is broadcast. 

The reloading wave propagation is periodical. The tree used to broadcast the 
wave is adaptive: it evolves with the moves of the token. Thus, two propagations 
of the reloading wave will likely use different propagation trees. 
In section 2, we present our model of distributed system and some preliminary
notions about random walks and self-stabilizing systems. In section 3, we
propose a token circulation scheme in a dynamic environment. We prove that
this scheme guarantees the specification of token circulation as long as topology
changes are independent of the token moves. In section 4, we introduce the
reloading wave mechanism to design a self-stabilizing version of the previous
algorithm. This new mechanism is proved to work in a static environment. The
case of a dynamic environment is discussed in section 5, where we determine a
criterion on the mobility pattern that makes the algorithm robust against
topological reconfigurations. In the last section, we propose to optimize a
parameter (the timeout) of our algorithm to accelerate its convergence to a
configuration where the specification of the problem is satisfied.

2 Model and Preliminaries 

2.1 Distributed systems 

We consider a distributed system as an undirected connected graph $G = (V, E)$,
where $V$ is a set of nodes with $|V| = n$ and $E$ is the set of bidirectional
communication links with $|E| = m$. A node is composed of a computing unit and
a message queue. A communication link $(i, j)$ exists if and only if $i$ and $j$ are
neighbors. Every node $i$ maintains a set of its neighbors' ids (denoted by $N_i$).
The degree of $i$ is the number of neighbors of $i$, i.e. $|N_i|$ (denoted by $\deg(i)$).
We consider a distributed system in which all nodes have distinct identities. We
assume an upper bound $N$ on the number of nodes in the network, an upper
bound on the delay to deliver a message and an upper bound on the processing
time for each node. The sum of these two bounds corresponds to the time
for receiving and processing a message. In the sequel, we take this as the time
unit. Moreover, we assume reliable channels during and after the stabilization
phase.

2.2 Model 

A configuration of the system is an instance of the node states and a multiset
of messages in transit on the links. $Token_\gamma$ is the set of all token messages
in transit in the network at configuration $\gamma$, and $Token_\gamma(i)$ is the set of
the Token messages heading toward node $i$ at configuration $\gamma$. A computation
$e$ of the system is a sequence of configurations $\gamma_1, \gamma_2, \ldots, \gamma_k, \ldots$ such that
configuration $\gamma_{k+1}$ is reached from $\gamma_k$ (denoted by $\gamma_k \to \gamma_{k+1}$) by a single step,
a step being the atomic processing of one message in the system. A configuration
$\delta$ is said to be reachable from $\gamma$, denoted by $\gamma \leadsto \delta$, if there exists a sequence
such that $\gamma = \gamma_0 \to \gamma_1 \to \ldots \to \gamma_{k-1} \to \gamma_k = \delta$. Let $\mathcal{C}$ be the set of possible
configurations of the system and $\mathcal{E}$ be the set of all possible computations of the
system. The set of computations starting with the configuration $\gamma$ is denoted by
$\mathcal{E}_\gamma$. The set of computations of $\mathcal{E}$ whose initial configurations are all elements
of $A \subseteq \mathcal{C}$ is denoted by $\mathcal{E}_A = \bigcup_{\gamma \in A} \mathcal{E}_\gamma$.

The only node variable in our algorithm is a timeout.

Remark 1 Since the algorithm we design is random, it would be more accurate
to describe a computation as a random process $\mathcal{E}_\gamma(\omega) = (\gamma_1(\omega), \gamma_2(\omega), \ldots)$, with
$\gamma_i : \omega \in \Omega \mapsto \gamma_i(\omega)$ random variables. Then, the random choice of a neighbor to
which the token is sent would make a random walk of the sequence of vertices to
which a given token is sent, which is enough to establish the properties required
to prove the algorithm. Thus, to avoid overly unwieldy notations, we omit the
$\omega$ in the sequel and explicitly use the relevant properties of random walks when
required.

2.3 Failures and self-stabilization 

A transient fault is a fault that causes the state of a process (its local state,
program counter, and variables) and of a channel (arbitrary messages may be
removed and added) to change arbitrarily without further affecting the behavior of
the algorithm. An algorithm is called self-stabilizing if it is resilient to transient
failures in the sense that, when initiated in an arbitrary system configuration,
and if no other transient fault occurs, the algorithm converges to a legitimate
configuration after which it performs its task correctly (see [Dij74, Dol00]). Thus,
a self-stabilizing system experiencing any transient failure eventually recovers
its normal behavior.

As we work with random walks, we cannot ascertain the time at which
a property will become true, but we can know with high probability that it
will. "With high probability" (in the sequel, "whp") means that the probability
that the event never occurs is zero: nothing forbids an infinite execution in
which the event never occurs, but as time goes by, it is less and less likely
that the event has not occurred yet. Thus, most of the properties we prove hold
whp, and the convergence times are expected times (no deterministic bound can be
provided).

$\mathcal{C}$ being the set of all configurations of the system, an algorithm is
self-stabilizing if there is a set of legitimate configurations $\mathcal{LC}$ such that:

1. the system eventually reaches a legitimate configuration (convergence
property);

2. starting from any legitimate configuration, the system remains in $\mathcal{LC}$
(closure property);

3. starting from any legitimate configuration, the execution of the algorithm
satisfies the specification of the problem (correctness property).

More formally, in this paper we use the notion of attractor to define the 
self-stabilization concept. 






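Definition 1 (Attractor) Let $B \subseteq \mathcal{C}$ and $A \subseteq B$. $A$ is an attractor of $B$ if
and only if:

• convergence: $\forall (\gamma_1, \gamma_2, \ldots) \in \mathcal{E}_B, \exists i \geq 1, \gamma_i \in A$

• closure: $\forall (\gamma_1, \gamma_2, \ldots) \in \mathcal{E}, \gamma_1 \in A \Rightarrow \forall i, \gamma_i \in A$

Definition 2 (Probabilistic attractor) Let $B \subseteq \mathcal{C}$ and $A \subseteq B$. $A$ is a
probabilistic attractor of $B$ if and only if:

• convergence: $\forall (\gamma_1, \gamma_2, \ldots) \in \mathcal{E}_B, \exists i \geq 1, \gamma_i \in A$ whp.

• closure: $\forall (\gamma_1, \gamma_2, \ldots) \in \mathcal{E}, \gamma_1 \in A \Rightarrow \forall i, \gamma_i \in A$.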

This means that starting from any configuration in $B$, the system eventually
reaches a configuration in $A$ whp: an execution can be built in which the system
never reaches $A$, but such an execution requires a sequence of decisions that are
less and less likely as time goes by. For instance, one can imagine never winning
at heads or tails, but the longer one plays, the less probable this is. Once
the system has reached a configuration in $A$, it (deterministically) remains in
$A$.

Definition 3 (Specification) A specification is a predicate on a computation. 

Definition 4 (Self-stabilization) A system is self-stabilizing if and only if
there exists a non-empty set $\mathcal{LC} \subseteq \mathcal{C}$ such that

• $\mathcal{LC}$ is an attractor for $\mathcal{C}$.

• Every $e$ in $\mathcal{E}_{\mathcal{LC}}$ meets the problem specification.

Definition 5 (Probabilistic self-stabilization) A system is probabilistically
self-stabilizing if and only if there exists a non-empty set $\mathcal{LC} \subseteq \mathcal{C}$ such that

• $\mathcal{LC}$ is a probabilistic attractor for $\mathcal{C}$.

• Every $e$ in $\mathcal{E}_{\mathcal{LC}}$ meets the problem specification.

Thus, a probabilistically self-stabilizing algorithm is such that the longer
one waits, the less likely it is that the algorithm does not meet the
specification. This probability can be bounded by a quantity that tends to 0.

2.4 Random walks properties 

A random walk is a sequence of vertices visited by a token that starts at $i$
and visits other vertices according to the following transition rule: if the token
is owned by $i$ at time $t$ then at time $t + 1$, it will be owned by one of its
neighbors, this neighbor being chosen uniformly at random among all of them
[Lov93, AKL+79].
Algorithm 1 Random walk circulation algorithm on site i
R0: Upon reception of a message (Token)
    Choose j uniformly at random in N_i
    Send Token to j
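
As an illustration of rule R0, the following minimal Python sketch replays the forwarding rule on a small graph. The 4-node ring `RING`, the function names and the fixed seed are assumptions made for this example only; they are not part of the algorithm above.

```python
import random

def forward_token(node, neighbors, rng):
    """Rule R0: the token holder picks one neighbor uniformly at random."""
    return rng.choice(neighbors[node])

def circulate(neighbors, start, steps, seed=0):
    """Return the sequence of token holders over `steps` forwarding steps."""
    rng = random.Random(seed)
    holder = start
    trace = [holder]
    for _ in range(steps):
        holder = forward_token(holder, neighbors, rng)
        trace.append(holder)
    return trace

# Hypothetical 4-node ring, used only for illustration.
RING = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
trace = circulate(RING, start=0, steps=20)
```

Each node only needs its own neighbor list to take a step, which is why no rule has to change when the topology does.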



To compute the complexity of a random walk based distributed algorithm, 
we use three main quantities: 

• The hitting time is the average time to reach a node $j$ starting from a
node $i$, and is denoted by $h_{ij}$. It is defined as the conditional expectation
of the random number of transitions before entering $j$ for the first time,
knowing that the token starts from $i$. It has been proven in [Lov93] that
$h_{ij}$ is bounded by $\frac{4}{27} n^3$. In [IKOY02, NOSY10], the authors provide a local
mechanism to reduce this value to $n^2$.

• The cover time is the expected time for a random walk starting at $i$ to
visit all the nodes of the system and is denoted by $C_i$. The cover time
of a graph is $C = \max\{C_i \mid i \in V\}$, and it was proven in [Fei95b, Fei95a]
that, depending on the topology of $G$, $O(n \ln n) \leq C \leq O(n^3)$.

• Finally, the meeting time is the expected time for several random walks
to meet on an arbitrary node and is denoted by $M$. The meeting time is
bounded by $O(n^3)$ [TW91].
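
To give a feeling for how the cover time depends on the topology, the sketch below estimates it by simulation on two small graphs. The helper names, the graph constructors, the number of runs and the seed are all assumptions of this example; it is a Monte Carlo estimate, not an exact computation.

```python
import random

def cover_time_once(neighbors, start, rng):
    """Number of steps until a walk from `start` has visited every node."""
    unvisited = set(neighbors) - {start}
    node, steps = start, 0
    while unvisited:
        node = rng.choice(neighbors[node])
        unvisited.discard(node)
        steps += 1
    return steps

def estimate_cover_time(neighbors, start, runs=2000, seed=1):
    """Average of `runs` independent cover times (Monte Carlo estimate)."""
    rng = random.Random(seed)
    total = sum(cover_time_once(neighbors, start, rng) for _ in range(runs))
    return total / runs

def complete_graph(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def path_graph(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

dense = estimate_cover_time(complete_graph(6), start=0)
sparse = estimate_cover_time(path_graph(6), start=0)
```

On a path with $n$ nodes, the expected cover time from an endpoint is $(n-1)^2$, while on a complete graph it follows the coupon collector behavior, so `sparse` should come out noticeably larger than `dense`.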

2.5 Problem Specifications 

The specification we are willing to meet is the following, to ensure a consistent 
token circulation: 

• at each step, exactly one Token message circulates in the system; 

• any node will receive the token message infinitely often whp. 

Since we suppose the treatment of messages is atomic, and take the successive 
configurations of the system when the considered node has finished with its local 
treatment, we need not consider the case when the token is being treated. 

We denote by $Token_\gamma$ the set of token messages at configuration $\gamma$, and by
$Token_\gamma(i)$ the set of token messages heading to a node $i$.

Definition 6 (Problem Specification) We say a computation $e = (\gamma_1, \gamma_2, \ldots)$
satisfies specification ProbTokCirc of Probabilistic Token Circulation if:

• $\forall k, |Token_{\gamma_k}| = 1$ (there exists exactly one token in the system);

• $\forall k, \forall i, \exists l > k, |Token_{\gamma_l}(i)| = 1$ whp (any node will receive the token
infinitely often).

Our contribution is to design a solution that eventually satisfies this
specification in a dynamic and faulty environment.
3 Dealing with topology changes



We show in this section that a random walk on a dynamic graph has the same
properties as a random walk on a static graph. A random walk is well adapted
to dynamically evolving graphs. Indeed, its traversal is based only on local
information, and it does not have to be redesigned after a topological change. We
prove in this section that if the node mobility is independent of the token
moves (in particular, if no daemon picking the random moves of the token
behaves as an adversary), desirable properties hold:

• any node is visited in finite time; 

• we can compute the average time it takes to hit a given node, or to visit 
all nodes. 

Consider a dynamic graph on a static set of nodes, with dynamic edges
$G_t = (V, E_t)$, with $t$ a continuous time index. We model the disconnection of a
node by all its links being removed. We suppose in the sequel that:

• the evolution of the graph is a homogeneous Markov process (i.e. the
evolution of the system topology only depends on its current state);

• it is independent of the choices of the random walk (this avoids cases
with the system behaving as an opponent to the walk).



The homogeneity assumption means that the token evolves much faster than
the system. Clearly, in most concrete applications, the system evolution is driven
by some daily cycle. If the system evolves little at a time scale below one minute,
and the hitting time is itself below one minute, then this assumption is
realistic in the following computations.

Thus, if the system is considered at each reception of the token, the evolution
of the graph is discretized. In the sequel, we consider the discretization $G_\gamma$,
which is a Markov chain by independence of the token movements and of the
graph evolution.

Given the graphs $G$ and $G'$, we denote by $p_{G \to G'}$ the probability that, the
dynamic graph being $G$ at a step, it is $G'$ at the next step. A step corresponds
to a time unit (cf. Section 2.1). If the graph evolves as a Markovian process, then
this discretization is a Markov chain.

We denote by $p_{ij}(G)$ the probability that, in $G$, node $i$ sends the token to $j$.
$h_{ij}(G)$ is the average time it takes the walk, starting on $i$, to reach $j$, knowing
that at the beginning of the walk, the system is in the state described by $G$.
Finally, the system being described as a homogeneous Markov chain with a
non-bipartite finite state space, it has a stationary distribution, which we denote
by $\pi$. The state space is non-bipartite, since the opposite would mean that edges
blink at the exact same pace as the token moves. A stationary distribution means
that if we look at the system at a certain time, then with probability $\pi(G)$, its
topology is $G$.

Let $p_{ij} = \sum_G \pi(G) p_{ij}(G)$ be the average probability that the token, being on
$i$, is sent to $j$.
Theorem 1 The hitting time of a random walk on a dynamic graph is such
that $h_{ij} = 1 + \sum_k p_{ik} h_{kj}$, and $h_{ii} = 0$.

Proof We state that:

$$h_{ij}(G) = \sum_k p_{ik}(G) \left( 1 + \sum_{G'} p_{G \to G'} h_{kj}(G') \right)$$

This means that, the token being in $i$ and the system being described by $G$,
the token is sent to $k$ with probability $p_{ik}(G)$. Sending it to $k$ takes one step,
and then the token has to go from $k$ to $j$, the system topology having evolved
to $G'$ (with probability $p_{G \to G'}$) in the meantime. The hitting time from $i$ is one
step to send the token to one neighbor, plus the expectation over the chosen
neighbor of the average time it takes the token to go from it to $j$. This hitting
time from $k$ to $j$ is the average hitting time over the possible system states $G'$.

We are interested in the hitting time from node $i$ to node $j$. If we have no
information on the system state at the beginning of the process, we take the
average hitting time over all possible system states: $h_{ij} = \sum_G \pi(G) h_{ij}(G)$.

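$$
\begin{aligned}
h_{ij} &= \sum_G \pi(G) h_{ij}(G) && \text{by definition} \\
&= \sum_G \pi(G) \sum_k p_{ik}(G) \left( 1 + \sum_{G'} p_{G \to G'} h_{kj}(G') \right) && \text{according to the previous equation} \\
&= \sum_G \sum_k \pi(G) p_{ik}(G) + \sum_G \pi(G) \sum_k p_{ik}(G) \sum_{G'} p_{G \to G'} h_{kj}(G') \\
&= \sum_k \sum_G \pi(G) p_{ik}(G) + \sum_k \sum_{G'} \sum_G \pi(G) p_{G \to G'} p_{ik}(G) h_{kj}(G') \\
&= \sum_k p_{ik} + \sum_k \sum_{G'} \sum_G \pi(G) p_{G \to G'} p_{ik}(G) h_{kj}(G') && \text{by definition of } p_{ik} \\
&= \sum_k \left( p_{ik} + \sum_{G'} \sum_G \pi(G) p_{G \to G'} p_{ik}(G) h_{kj}(G') \right) \\
&= \sum_k \left( p_{ik} + \sum_{G'} h_{kj}(G') \sum_G \pi(G) p_{G \to G'} p_{ik}(G) \right) \\
&= \sum_k \left( p_{ik} + \sum_{G'} h_{kj}(G') \left( \sum_G \pi(G) p_{G \to G'} \right) \left( \sum_G \pi(G) p_{ik}(G) \right) \right)
\end{aligned}
$$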



The last equality comes from the independence of the system evolution and
the token moves. Indeed, $\sum_G \pi(G) p_{G \to G'} p_{ik}(G)$ is the expectation over the
system states of the probability that the system evolves to a given state $G'$
times the probability that the token moves to $k$. Since these quantities are
independent, the expectation of their product is the product of their expectations,
$\left( \sum_G \pi(G) p_{G \to G'} \right) \left( \sum_G \pi(G) p_{ik}(G) \right)$.
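Thus, since $\sum_G \pi(G) p_{G \to G'} = \pi(G')$ (by definition of a stationary
distribution) and $\sum_G \pi(G) p_{ik}(G) = p_{ik}$ (by definition):

$$
\begin{aligned}
h_{ij} &= \sum_k \left( p_{ik} + \sum_{G'} h_{kj}(G') \left( \sum_G \pi(G) p_{G \to G'} \right) \left( \sum_G \pi(G) p_{ik}(G) \right) \right) \\
&= \sum_k \left( p_{ik} + \sum_{G'} h_{kj}(G') \pi(G') p_{ik} \right) && \text{by definition of } \pi \text{ and } p_{ik} \\
&= \sum_k p_{ik} \left( 1 + \sum_{G'} h_{kj}(G') \pi(G') \right) \\
&= \sum_k p_{ik} (1 + h_{kj})
\end{aligned}
$$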

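Now, $p_{ij}$ is a transition probability:

$$
\begin{aligned}
\sum_j p_{ij} &= \sum_j \sum_G \pi(G) p_{ij}(G) \\
&= \sum_G \pi(G) \sum_j p_{ij}(G) \\
&= \sum_G \pi(G) = 1
\end{aligned}
$$

Finally, $h_{ij} = 1 + \sum_k p_{ik} h_{kj}$ and $h_{ii} = 0$, which is the very equation followed
by the hitting time of a random walk on a weighted graph $G = (V, V \times V, \omega)$,
with $\omega(i,j) = p_{ij}$. □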
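
The equation of Theorem 1 can be solved numerically once the averaged transition probabilities are known. The sketch below is one possible way to do it, under stated assumptions: the topologies `TRIANGLE` and `PATH`, their stationary weights, and the function names are hypothetical; every node is assumed to keep at least one neighbor in every topology; and the linear system is solved by a plain Gauss-Jordan elimination over exact fractions.

```python
from fractions import Fraction

def average_transitions(graphs, pi):
    """p_ij = sum_G pi(G) * p_ij(G), with a uniform neighbor choice in each G."""
    nodes = sorted(graphs[0])
    p = {i: {j: Fraction(0) for j in nodes} for i in nodes}
    for g, weight in zip(graphs, pi):
        for i in nodes:
            for j in g[i]:
                p[i][j] += weight * Fraction(1, len(g[i]))
    return p

def hitting_times(p, target):
    """Solve h_ij = 1 + sum_k p_ik h_kj with h_jj = 0, for j = target."""
    others = [i for i in p if i != target]
    m = len(others)
    # Augmented system (I - P restricted to `others`) h = 1.
    rows = [[(Fraction(1) if i == k else Fraction(0)) - p[i][k] for k in others]
            + [Fraction(1)] for i in others]
    for col in range(m):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    h = {i: rows[idx][m] for idx, i in enumerate(others)}
    h[target] = Fraction(0)
    return h

# Hypothetical dynamic graph: a triangle and a path, each seen half of the time.
TRIANGLE = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
PATH = {0: [1], 1: [0, 2], 2: [1]}
p = average_transitions([TRIANGLE, PATH], [Fraction(1, 2), Fraction(1, 2)])
h = hitting_times(p, target=2)
```

In this example the averaged walk leaves node 0 for node 1 with probability 3/4 and for node 2 with probability 1/4, so the resulting weights are not symmetric in general.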

Corollary 1 Random walks on dynamic graphs verify the hitting, cover, and 
(if the graph is not bipartite) meeting properties. 

Proof The computation of the hitting time in Theorem 1 implies that it is finite.
Thus the hitting property is verified for any node. The cover property follows
from the hitting property. □

Corollary 2 Algorithms in [BS01] that compute hitting and cover times apply.

However, $\omega(i,j)$ and $\omega(j,i)$ can be different, which would make the graph
directed. Classical bounds on hitting times and cover times may not apply.



4 A self-stabilizing token maintenance mechanism

4.1 Principles 

In this section, we focus on the token maintenance mechanism. For the sake
of clarity, we assume in this section that the number of nodes in the system is
exactly n. We will discuss in the next section how to relax this assumption.

We consider a token that circulates through the system using a random walk
scheme; thus, by Corollary 1, all nodes are visited infinitely often (satisfying the
second part of the specification, cf. Definition 6). The system can be erroneously
initialized: configurations with no token, or with several tokens, can occur. To
solve the absence of token, we introduce a content in the token. As the designed
solution is self-stabilizing, we have to deal with arbitrarily initialized token
content.

4.1.1 Dealing with the absence of token

The lost token situation is solved by a decentralized timeout procedure: every
processor, without distinction, has the possibility of producing a new token. Each
node maintains a timer. Each timer is set to a value of $T_m$ time units. (The way
to tune the value of $T_m$ will be discussed in Section 6.) Each node measures the
time since the last token visit. If this time is greater than $T_m$, then a new token
is created. No upper bound is available on the time the token takes to return to a
node $i$, which makes it impossible to use solutions like the one in [Var00],
consisting in setting a timeout on each node, at the expiration of which, if no
token has been received, a new one is created. The following impossibility result
proves this.

Proposition 1 (Impossibility result) Whatever the timer values of each
node in the system, the closure property is not satisfied whp.

Proof Let $i$ be a node in $G$ with (at least) two neighbors $j$ and $k$ (in a connected
graph with more than two nodes, such a node exists). Consider a legitimate
configuration with the token on $i$ and a timer $T$ on $k$. Let $d = \max\{\deg(i), \deg(j)\}$.
When the token is on $i$, it goes to $j$ with probability at least $1/d$. Then, with
probability at least $1/d$, it goes back to $i$. Repeating this argument, with
probability at least $1/d^T > 0$, the token does not hit $k$ for $T$ steps, so the timer
of $k$ expires and an unnecessary token is created. Thus, whp,
the system spontaneously leaves a legitimate configuration. □
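
The argument of this proof can be checked numerically on a small case. The sketch below computes, by dynamic programming over the token position, the exact probability that a walk avoids a given node for a given number of steps; the triangle graph, the node labels and the function name are assumptions made for this illustration.

```python
from fractions import Fraction

def avoid_probability(neighbors, start, avoided, steps):
    """Exact probability that a walk from `start` does not enter `avoided`
    during its first `steps` moves."""
    dist = {start: Fraction(1)}  # position distribution of surviving walks
    for _ in range(steps):
        nxt = {}
        for node, prob in dist.items():
            share = prob / len(neighbors[node])
            for nb in neighbors[node]:
                if nb != avoided:
                    nxt[nb] = nxt.get(nb, Fraction(0)) + share
        dist = nxt
    return sum(dist.values(), Fraction(0))

# Hypothetical triangle: i = 0, j = 1 and the timed node k = 2.
TRIANGLE = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
p_miss = avoid_probability(TRIANGLE, start=0, avoided=2, steps=10)
```

On the triangle the token must bounce between the two other nodes, so the probability of missing node 2 for $T$ steps is $(1/2)^T$: small, but strictly positive for every finite timer value.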
To avoid unnecessary token creation, we propose a solution with the following 
mechanisms: 

1. A local mechanism for monitoring the last visit time of the token to a 
node i. 

2. A mechanism for detecting that some timers are about to expire. This 
mechanism is maintained by the node which is the current token holder. 

3. A mechanism maintaining a spanning tree, which is rooted at the current 
token holder. 

4. A distributed mechanism that propagates messages on this tree in order 
to reset the timers. 

The first two items correspond to the decentralized timeout procedure. The 
last two items correspond to the reloading wave. 
(3) Reloading wave definition A reloading wave is defined with respect to a
token identity.

When a node i receives a reloading wave message, i is notified that a token is
still circulating in the system, and it resets its timer. Thus, the reloading wave
prevents node i from creating a copy of token t.

The reloading wave is broadcast through an adaptive (spanning) tree. There
is no additional protocol, since we use the content of token t. The token collects
and stores the identities of the nodes it visits during its random walk traversal.
This content is based on the history of the token's moves. Such a token is called
a circulating word. Since the token is circulating continuously through the
network, the induced tree is perpetually updated, taking into account the possible
network topology changes.

A token t contains the following data structures: 

• A counter, t.hop, that represents the number of edges visited during the
traversal.

• An array, t.table. Each time the token moves from a node j to a node i,
the token sets t.table[j] = i, and t.table[i] = i.

Each time a node i receives the token t, a tree rooted on i can be locally
computed by i using the topological information stored in the token. The
tree induced by t.table is $(V, E_t)$ where $E_t = \{(k, t.table[k]) : k \in V$ and $k \neq
t.table[k]\}$.

Example 1 After the sequence of token moves < 1, 3, 5, 4, 3 >, the token is at
node 3 and t.table is as follows (⊥ represents the value "undefined"):

    k            1   2   3   4   5
    t.table[k]   3   ⊥   3   3   4

The forest induced by t.table is $(V, E_t)$ where $V = \{1,3,4,5\} \cup \{2\}$ and
$E_t = \{(1,3), (4,3), (5,4)\}$.

If the next token moves are < 2, 1, 2, 3, 1 >, the token table is updated to

    k            1   2   3   4   5
    t.table[k]   1   3   1   3   4

and the tree induced by t.table is now $(V, E_t)$ with $V = \{1,2,3,4,5\}$
and $E_t = \{(3,1), (2,3), (4,3), (5,4)\}$.
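
Example 1 can be replayed mechanically. The following sketch applies the table update rule to a sequence of moves and extracts the induced edges; the function names, the use of `None` for the undefined value, and the choice to mark the first node as its own root are assumptions of this illustration.

```python
UNDEFINED = None  # stands for the "undefined" value

def apply_moves(moves, node_ids):
    """Replay token moves: moving from j to i sets table[j] = i, table[i] = i."""
    table = {k: UNDEFINED for k in node_ids}
    table[moves[0]] = moves[0]  # assumption: the first holder is its own root
    for j, i in zip(moves, moves[1:]):
        table[j] = i
        table[i] = i
    return table

def induced_edges(table):
    """Edges (k, table[k]) for every k whose entry is defined and differs from k."""
    return {(k, parent) for k, parent in table.items()
            if parent is not UNDEFINED and parent != k}

nodes = range(1, 6)
first = apply_moves([1, 3, 5, 4, 3], nodes)
second = apply_moves([1, 3, 5, 4, 3, 2, 1, 2, 3, 1], nodes)
```

Here `induced_edges(first)` gives the forest of the first half of the example, and `induced_edges(second)` gives the spanning tree rooted at node 1 obtained after the five additional moves.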

(4) The reloading wave mechanism The reloading wave is broadcast under
the following conditions: the token maintains the counter t.hop, which is
incremented at each hop. The counter is set to 0 at token creation. This value
is compared to the timeout value minus the time needed to complete a wave
propagation. When the counter value is greater than or equal to the latter value,
the node that holds the token launches the wave and the token counter t.hop is
reset to 0. When a node receives the wave from the token, it reloads its timer to
$T_m$.
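
A minimal sketch of this mechanism is given below, under several assumptions: the propagation time is taken to be n + 1 time units, the wave is relayed from each node to the neighbors that name it as parent in the token table, and termination is obtained with a list of already reached nodes rather than by erasing table entries. The function names and the sample graph are hypothetical.

```python
def should_launch_wave(hop, timeout, n):
    """Launch test: the hop counter reached the timeout minus the
    (assumed) propagation time n + 1."""
    return hop >= timeout - (n + 1)

def wave_recipients(table, neighbors, root):
    """Nodes reached by a wave started at `root` and relayed down the tree.

    A node i forwards the wave to every j with table[j] == i that is still
    one of its neighbors.
    """
    reached, frontier = [], [root]
    while frontier:
        i = frontier.pop(0)
        for j in sorted(neighbors[i]):
            if j != i and j != root and table.get(j) == i and j not in reached:
                reached.append(j)
                frontier.append(j)
    return reached

# Table and links taken from the token moves of a 5-node example.
TABLE = {1: 1, 2: 3, 3: 1, 4: 3, 5: 4}
LINKS = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4, 5], 4: [3, 5], 5: [3, 4]}
order = wave_recipients(TABLE, LINKS, root=1)
```

The neighbor test matters in a dynamic setting: a table entry whose link has disappeared is simply not used, and the corresponding subtree is not reloaded by this wave.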

4.1.2 Configurations with multiple tokens 

To design a self-stabilizing solution, starting from any initial configuration, the
system must converge to a correct behavior: exactly one random walk based
token circulates through the system. The previous section deals with the way
to produce at least one token when a communication deadlock occurs. Faulty
configurations with several tokens are possible (due to duplication for instance).

Various articles [IJ90, TW91] have dealt with the multiple token situation
in the case of a random walk scheme. The authors propose to use the meeting
property (cf. Corollary 1) of random walks to reduce the number of tokens to 1:
each time a node receives several tokens, it discards all of them but one. Thus,
in finite time, a single token remains in the network.

Two strategies are possible:

• remove all tokens but one; 

• merge the content of all tokens in a new one. 

We propose to merge all the topological information before discarding any
token. This strategy entails more computation, but accelerates the construction
of a spanning tree inside a token. Once all the sub-trees contained in the different
tokens have been merged (cf. Procedure 1), the resulting sub-tree is stored in
the remaining token (the one that is not discarded, cf. Rule R1.b of Algorithm 2).



Procedure 1 merge_tokens(t1: token, t2: token) on node i
    for k = 0 to N do
        if (t1.table[k] = ⊥) ∧ (t2.table[k] ≠ ⊥) then
            t1.table[k] ← t2.table[k]
        end if
    end for
    t1.hop ← max(t1.hop, t2.hop)
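
The merge procedure translates directly into a few lines of Python. In this sketch the `Token` class, the dictionary-based table and the use of `None` for the undefined value are assumptions chosen for readability; only the merge rule itself comes from Procedure 1.

```python
from dataclasses import dataclass, field

UNDEFINED = None  # stands for the "undefined" value

@dataclass
class Token:
    table: dict = field(default_factory=dict)  # node id -> parent id or UNDEFINED
    hop: int = 0

def merge_tokens(t1, t2):
    """Fold t2 into t1: fill the undefined entries of t1, keep the larger hop."""
    for k, parent in t2.table.items():
        if t1.table.get(k, UNDEFINED) is UNDEFINED and parent is not UNDEFINED:
            t1.table[k] = parent
    t1.hop = max(t1.hop, t2.hop)
    return t1

a = Token(table={1: 3, 2: UNDEFINED, 3: 3}, hop=4)
b = Token(table={1: 1, 2: 1, 3: UNDEFINED}, hop=7)
merged = merge_tokens(a, b)
```

Entries already defined in the surviving token are never overwritten, so the merge only adds topological information, and keeping the larger hop counter means the next reloading wave is not delayed by the merge.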
Figure 1: Example of two merged tokens (panels: token t1, token t2, resulting token)



4.1.3 Configuration with arbitrary token content 

Transient failures can produce erroneous token content: a node j may be
registered as the father of node i in the reloading wave tree while they are
not neighbors. Then, when i is hit by the token, its father is set to itself,
and the error is corrected.

4.2 The algorithm 

The algorithm is written according to four events on a node:

• Node i receives one or several tokens (R1). Each token is updated and its
consistency is checked (R1.a). In case of multiple tokens, they are merged
into one (R1.b). If the condition to launch the reloading wave is satisfied,
the node begins the propagation of the reloading wave (R1.c). Finally,
the token is forwarded to a neighbor chosen at random (R1.d) and the
node resets its timer (R1.e).

• The timer of node i expires (R2). A new empty token is created (R2.a)
and forwarded to a neighbor chosen at random (R2.b). The node resets
its timer (R2.c).

• Node i receives a reloading wave message (R3). The node continues the
reloading wave propagation (R3.a) and resets its timer (R3.b).

• Node i's clock ticks (R4). The node decrements its timer.

This algorithm has four rules, R1 to R4, that are split into sections. All
sections in a rule are executed in sequence, and the rule is executed atomically.

Algorithm 2 Algorithm on site i

R1: Upon reception of a set T of Token messages
    a: Tokens update
        for all t ∈ T do
            t.table[i] ← i
            t.hop ← t.hop + 1
        end for
    b: Tokens merge
        choose t' in T
        T ← T \ {t'}
        while T ≠ ∅ do
            choose t2 in T
            merge_tokens(t', t2)
            T ← T \ {t2}
        end while
    c: Possible Reloading Wave propagation
        if t'.hop ≥ Tm − (n + 1) then
            for all j such that t'.table[j] = i ∧ j ∈ Ni do
                send (reload, t'.table) to j
            end for
            t'.hop ← 0
        end if
    d: Token circulation
        Send t' to j chosen randomly in Ni
    e: Node update
        timer ← Tm

R2: Upon expiration of the timer
    a: Token creation
        for j = 0 to N do
            t'.table[j] ← ⊥
        end for
        t'.table[i] ← i
        t'.hop ← 0
    b: Token circulation
        Send t' to j chosen randomly in Ni
    c: Node update
        timer ← Tm

R3: Upon reception of a message (reload, table)
    a: Reloading Wave propagation
        table[i] ← ⊥ {to ensure that the reloading wave terminates}
        for all j such that table[j] = i ∧ j ∈ Ni do
            send (reload, table) to j
        end for
    b: Node update
        timer ← Tm

R4: Upon a clock tick
    timer ← timer − 1
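The four rules can be sketched as event handlers of a node object. This is only an illustration under assumptions of ours: nodes are numbered from 0 to n − 1, ⊥ is `None`, messages are appended to a local `outbox` instead of being sent on a network, the wave is launched when the hop counter reaches Tm − (n + 1), and the names `Node`, `Token`, `on_tokens`, `on_timeout`, `on_reload` and `on_tick` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Token:
    table: List[Optional[int]]   # table[k] = father of node k, None stands for ⊥
    hop: int = 0

class Node:
    """Site i running rules R1 to R4; sent messages are collected in outbox."""

    def __init__(self, ident, neighbours, n, t_m, rng):
        self.i, self.neighbours, self.n, self.t_m = ident, neighbours, n, t_m
        self.rng = rng
        self.timer = t_m
        self.outbox = []                      # (destination, kind, payload)

    def _send_wave(self, table):
        # The wave goes to the sons of i in the tree that are still neighbours.
        for j in self.neighbours:
            if table[j] == self.i:
                self.outbox.append((j, "reload", list(table)))

    def on_tokens(self, tokens):              # R1
        for t in tokens:                      # R1.a: tokens update
            t.table[self.i] = self.i
            t.hop += 1
        merged = tokens[0]                    # R1.b: tokens merge
        for t2 in tokens[1:]:
            for k, father in enumerate(t2.table):
                if merged.table[k] is None and father is not None:
                    merged.table[k] = father
            merged.hop = max(merged.hop, t2.hop)
        if merged.hop >= self.t_m - (self.n + 1):   # R1.c: possible wave
            self._send_wave(merged.table)
            merged.hop = 0
        dest = self.rng.choice(self.neighbours)     # R1.d: token circulation
        self.outbox.append((dest, "token", merged))
        self.timer = self.t_m                       # R1.e: node update

    def on_timeout(self):                     # R2
        table = [None] * self.n
        table[self.i] = self.i
        dest = self.rng.choice(self.neighbours)
        self.outbox.append((dest, "token", Token(table)))
        self.timer = self.t_m

    def on_reload(self, table):               # R3
        table = list(table)
        table[self.i] = None                  # ensures that the wave terminates
        self._send_wave(table)
        self.timer = self.t_m

    def on_tick(self):                        # R4
        if self.timer > 0:
            self.timer -= 1

node = Node(0, [1, 2], n=3, t_m=10, rng=random.Random(0))
node.on_tokens([Token([None, 0, None], hop=0), Token([None, None, 0], hop=2)])
dest, kind, token = node.outbox[-1]
node.on_reload([0, 0, 0])
```

In this usage example, node 0 merges two tokens into one and forwards it, then relays a reloading wave to its two sons.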



4.3 Proofs 



We present in this section the correctness proofs of the algorithms. We show that
our algorithm is self-stabilizing and achieves a token circulation: the execution
of our algorithm starting in an arbitrary configuration will reach a legitimate
configuration (the set $\mathcal{LC}$ of configurations).

4.3.1 Preliminaries 

A configuration $\gamma$ is characterized by:

• the graph $G_\gamma = (V_\gamma, E_\gamma)$; in this section, no topological change is assumed,
so that this graph is constant: $G_\gamma = G = (V, E)$;

• the value of variables:

- the value of all timers $timer_i(\gamma)$ for all $i \in V_\gamma$;

• the multi-set of messages, composed of:

- $Token_\gamma$, the multiset of token messages, in $E \times V^n \times [0; T_m]$:
$t = ((i, j), table_t, hop_t) \in Token_\gamma$ means that there is (at least) one
token t sent from i and pending reception by j with table $table_t$ and hop
counter $hop_t$; we note $Token_\gamma(i) = \{((j, i), table_t, hop_t) \in Token_\gamma\}$
the set of all token messages pending reception by i; for
$t = ((j, i), table, hop) \in Token_\gamma$, we note t.emitter = j, t.recipient = i,
t.table = table and t.hop = hop;

- $Wave_\gamma$, the multiset of reloading wave messages, in $E \times V^n$:
$w = ((i, j), table_w) \in Wave_\gamma$ means that there is (at least) one reloading
wave message w sent from i and pending reception by j with table $table_w$.

We consider that the execution of an algorithm is atomic. 

First, we define what we call a token, and then we prove that the reloading 
wave has the intended effect: no token can be created by a node that has already 
received a token. Finally, we prove that the algorithm provides a self-stabilizing 

traversal scheme. 

Consider two configurations $\gamma \vdash \gamma'$ (the execution being supposed atomic,
such a step involves that a message has been received and treated to reach $\gamma'$
from $\gamma$). If $\gamma'$ is the result of the application of Rk ($1 \leq k \leq 4$) by node i,
we note $\gamma \vdash^{Rk(i)} \gamma'$. The execution of all algorithms being supposed atomic, if
$\gamma \vdash \gamma'$, we have the following possibilities:

1. $\gamma \vdash^{R1(i)} \gamma'$: then, $\gamma$ is such that $\exists ((k, i), t) \in T \subseteq Token_\gamma$, and $\gamma'$ is
obtained from $\gamma$ by:

(a) $t'.hop = (\max\{t.hop \,/\, t \in T\} + 1) \bmod (T_m - (n + 1))$;

(b) $\forall k \neq i, j$, $(\exists t \in T, t'.table[k] = t.table[k] \neq \bot) \vee (\forall t \in T, t.table[k] = \bot)$;
$t'.table[i] = t'.table[j] = i$;

(c) if $t'.hop = 0$, $Wave_{\gamma'} = Wave_\gamma \cup \{((i, j), t'.table) \,/\, t'.table[j] = i \wedge j \in N_i\}$;

(d) $Token_{\gamma'} = Token_\gamma \setminus T \cup \{((i, j), t')\}$, with $j \in N_i$;

(e) $timer_i^{\gamma'} = T_m$;

2. $\gamma \vdash^{R2(i)} \gamma'$: then, $\gamma$ is such that $timer_i^{\gamma} = 0$, and $\gamma'$ is obtained from $\gamma$
by:

(a) $\forall j \neq i, t'.table[j] = \bot$; $t'.table[i] = i$;

(b) $Token_{\gamma'} = Token_\gamma \cup \{((i, j), t')\}$ with $j \in N_i$;

(c) $timer_i^{\gamma'} = T_m$;

3. $\gamma \vdash^{R3(i)} \gamma'$: then, $\gamma$ is such that there is $((k, i), w) \in Wave_\gamma$, and $\gamma'$ is
obtained from $\gamma$ by:

(a) $Wave_{\gamma'} = Wave_\gamma \setminus \{((k, i), w)\} \cup \{((i, j), w') \,/\, w.table[j] = i \wedge j \in N_i,
\forall k, w'.table[k] = w.table[k], w'.table[i] = \bot\}$;

(b) $timer_i^{\gamma'} = T_m$;

4. $\gamma \vdash^{R4(i)} \gamma'$: then, $\gamma$ is such that $timer_i^{\gamma} > 0$, and $\gamma'$ is obtained
from $\gamma$ by:

(a) $timer_i^{\gamma'} = timer_i^{\gamma} - 1$.

In item 1, T represents the set of tokens that are received by i. T contains at
least one token, but may contain several of them, in which case they are merged
into one token noted t' in the sequel. 1a is the update of the hop counter: the
hop counter is incremented by one, and if it wraps around to 0, a wave is
propagated (1c) and the hop counter is reset (hence the mod $T_m - (n + 1)$).
Node i resets its timer (1e). 1b is the computation of the new table: i is the
root, and the father of the sender; the remainder of the tree is obtained by
picking for each node of the tree its father in one of the received trees.

At the timer expiration on node i, it sends a newly created token. 2a is the
creation of a tree consisting of the single node i. At 2c the timeout is reset. 2b
states that, at some edge neighboring i, the new token is added.

In 3a, node i receives a Wave message w and sends Wave messages w' to all
its children as indicated in w.table. It resets its timer (3b).
At each clock tick, node i decrements its timer (4a).

Between two successive applications of R4 by a given node, all nodes that
can apply R3, R1 and R2 apply them. In rules R1 and R2, node j is chosen
at random.

Definition 7 (Token and state of a token) From R1, we say that any token
in T has become t'. For t in T, we will note $t^{(\gamma)} \to t^{(\gamma')}$.

Definition 8 A node i is said to receive a token at step $\gamma \to \gamma'$ if there exists
a token t in an edge to i at configuration $\gamma$, with $t \to t'$, and t' is in an edge
from i.

All tokens follow a random walk. In particular, the hitting and cover
properties are verified, so that, for any node i and any token t in a configuration $\gamma$,
there exists a configuration $\gamma'$ in $\mathcal{E}_\gamma$ such that $t^{(\gamma')}$ is in an edge coming from i.

4.3.2 All tokens are eventually correct

This step needs no synchronism. Basically, the only needed property is that
the random walk covers the system, i.e. that the random number generators are
independent.

Definition 9 We say that a token t is correct and we note correct(t) if

$\forall k \in V, t.table[k] \neq \bot \Rightarrow (k, t.table[k]) \in E$

Lemma 1 $A_1 = \{\gamma \in \mathcal{C} \,/\, \forall t \in Token_\gamma, correct(t)\}$ is an attractor of $\mathcal{C}$.

Proof Note $inc(\gamma) = \{(t, i) \in Token_\gamma \times V \,/\, (i \neq t.emitter \wedge t.table[i] \neq \bot \wedge
(i, t.table[i]) \notin E) \vee (i = t.emitter \wedge t.table[i] \neq i)\}$. A token is correct if and
only if it does not appear in this set. We will show that eventually, $inc(\gamma) = \emptyset$.
First, we show that it is non-increasing, and then that, if it is not empty,
then it eventually decreases.

R3 and R4 do not affect $Token_\gamma$, and in consequence $inc(\gamma)$.

R2 creates a new empty token t. This token is correct: for all $j \neq t.emitter$,
$t.table[j] = \bot$, so that $(t, j) \notin inc(\gamma')$. Since other tokens are left unchanged,
$|inc(\gamma)|$ does not increase.

Consider the case when R1 is applied by node i to a set of tokens T. Then,
$Token_{\gamma'} = Token_\gamma \setminus T \cup \{t'\}$ with $t'.table[i] = i$, $\forall j \neq i, \exists t \in T, t'.table[j] =
t.table[j]$. Thus, each inconsistency in t' comes from an inconsistency in a token
in T: if $(t', j) \in inc(\gamma')$, there exists (at least) a token t in T such that $(t, j) \in
inc(\gamma)$. Thus, $inc(\gamma') \subseteq inc(\gamma)$. Now, if $t \in T$ is such that $(t, i) \in inc(\gamma)$,
$t'.table[i] = i$, and $(t', i) \notin inc(\gamma')$, so that $inc(\gamma') \subsetneq inc(\gamma)$ (note that by
merging several tokens, some other inconsistencies may be corrected).

Thus, $inc(\gamma)$ does not increase. Now, consider a configuration $\gamma$ such that
$inc(\gamma) \neq \emptyset$. Then, there exists $(t, i) \in inc(\gamma)$. The hitting property entails that
t will eventually hit i at configuration $\gamma'$, and then $inc(\gamma') \subseteq inc(\gamma) \setminus \{(t, i)\}$.

Thus, if $inc(\gamma) \neq \emptyset$, it eventually decreases. Eventually, it reaches $\emptyset$, and
then, all tokens are correct. □
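For a single token, the inconsistent entries counted in inc can be computed as below. This is a sketch under assumptions of ours: ⊥ is `None`, the topology is given as a list of undirected edges, and the function name `inconsistent_entries` is hypothetical.

```python
def inconsistent_entries(table, emitter, edges):
    """Nodes i such that (t, i) belongs to inc for a token t with this table."""
    undirected = {frozenset(e) for e in edges}
    bad = []
    for i, father in enumerate(table):
        if i == emitter:
            # The emitter must be registered as the root of the tree.
            if father != i:
                bad.append(i)
        elif father is not None and frozenset((i, father)) not in undirected:
            # A registered father must be an actual neighbor.
            bad.append(i)
    return bad

edges = [(0, 1), (1, 2)]                       # hypothetical 3-node path
print(inconsistent_entries([0, 0, 0], 0, edges))
print(inconsistent_entries([0, 0, 1], 0, edges))
```

In the first call, node 2 is registered with father 0 although they are not neighbors, which is the kind of erroneous content that a visit of the token to node 2 corrects.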

4.3.3 There is eventually a correct token (at least) in the system

This step requires that if a single rule is enabled, it is eventually triggered.

Lemma 2 $A_2 = \{\gamma \in \mathcal{C}, |Token_\gamma| \geq 1\}$ is an attractor of $\mathcal{C}$.

Proof First, we show that if there is a token in the system, it cannot disappear.
Consider a configuration $\gamma$ such that $Token_\gamma \neq \emptyset$ and $\gamma \to \gamma'$. If $\gamma \to^{R3(i)} \gamma'$ or
$\gamma \to^{R4(i)} \gamma'$, then $Token_{\gamma'} = Token_\gamma \neq \emptyset$. If $\gamma \to^{R2(i)} \gamma'$, $Token_{\gamma'} \supseteq Token_\gamma \neq
\emptyset$. Last, if $\gamma \to^{R1(i)} \gamma'$, $Token_{\gamma'}$ contains the token put at R1.d, and is not
empty.

Suppose $Token_\gamma = \emptyset$. First, we show that $Wave_\gamma$ is eventually empty.
Since $Token_\gamma = \emptyset$, if R2 is triggered, a token is created and $Token_\gamma$ is no
longer empty. Aside from R2, the only rules that can be triggered are R3 and R4.
R4 does not modify Wave. Consider a message $((k, i), w) \in Wave_\gamma$ received at
step $\gamma \to \gamma'$. $Wave_{\gamma'} = Wave_\gamma \setminus \{((k, i), w)\} \cup \{((i, j), w') \,/\, w.table[j] = i \wedge j \in
N_i, \forall k, w'.table[k] = w.table[k], w'.table[i] = \bot\}$. Thus, since a wave message v
is sent to i only by v.table[i], i cannot receive any more message triggered by
w. Thus, all sites can receive at most one wave message for each wave message
present in $Wave_\gamma$. Thus, eventually, $Wave_\gamma = \emptyset$.

Now, the only rules that apply are R2 and R4. The continuing application
of R4 leads a timeout (or even all of them, leaving R2 the only activated
rule) to reach 0, so that R2 is triggered, and a token created. □

Corollary 3 $A_1 \cap A_2$ is an attractor of $\mathcal{C}$.

4.3.4 No visited node can create a token — Reloading wave and 
synchronicity 

To verify this property, we need synchronicity assumptions: all node timers
must be decremented at most once in the time it takes for a token to be
received, treated, and sent again. We also need the cover property to hold, and
thus independent random number generators on the nodes.

We consider an arbitrary configuration $\gamma_0 \in A_1 \cap A_2$. All the following
properties are about an execution $\mathcal{E}_{\gamma_0} = (\gamma_0, \gamma_1, \ldots)$. We consider a configuration $\gamma_i$ in $\mathcal{E}_{\gamma_0}$,
and a token t in $\gamma_i$. We consider the set $A_i(t)$ of all nodes that have received
the token t since $\gamma_0$: $A_0(t) = \emptyset$.

Lemma 3 $t^{(\gamma_i)}.tab|_{A_i(t)}$ represents a spanning tree of $(A_i(t), E \cap A_i(t)^2)$.

Proof Obviously, $t^{(\gamma_0)}.tab|_{A_0(t)} = \emptyset$ is a spanning tree of $A_0(t) = \emptyset$.

The application of R3 and R4 entails no change on either $A_i$ or the token
messages. Thus, we only consider the application of R1 and R2.

Consider a step $t^{(\gamma_i)} \to t^{(\gamma_{i+1})}$ at which a node i receives the token t from j,
and suppose that $t^{(\gamma_i)}.tab|_{A_i(t)}$ is a spanning tree of $A_i(t)$.

Then at the next step, $t^{(\gamma_{i+1})}.tab[j] = i$, $t^{(\gamma_{i+1})}.tab[i] = i$ and $t^{(\gamma_{i+1})}.tab[k] =
t^{(\gamma_i)}.tab[k]$ for any other k in $A_i(t)$. Since $t^{(\gamma_i)}.tab|_{A_i(t)}$ represents a spanning
tree of $(A_i(t), E \cap A_i(t)^2)$, for any $k \neq i, j$, $(k, t^{(\gamma_{i+1})}.tab[k]) = (k, t^{(\gamma_i)}.tab[k])$ is
an edge of $(A_i(t), E \cap A_i(t)^2)$ and internal_test will not remove k from this
array. Since i has received the token from j, $(j, t^{(\gamma_{i+1})}.tab[j]) = (j, i)$ is an edge of
$(A_{i+1}(t), E \cap A_{i+1}(t)^2)$.

If several tokens are pending reception by i, they may be merged: $t_1^{(\gamma_i)} \to t^{(\gamma_{i+1})}$
and $t_2^{(\gamma_i)} \to t^{(\gamma_{i+1})}$. Then, since $t_1$ and $t_2$ are correct, t is also correct, and thus,
$t^{(\gamma_{i+1})}.tab$ is a spanning tree of $(A_i(t), E \cap A_i(t)^2)$.

Now two cases can occur: either i is in $A_i(t)$, or not. In both cases, setting
i as the root of the tree and the father of j, while leaving the remainder of the
tree unchanged, gives a tree.

Thus, $t^{(\gamma_i)}.tab|_{A_i(t)}$ represents a spanning tree of $(A_i(t), E \cap A_i(t)^2)$. □

A node may belong to several spanning trees, if it has been visited by several
tokens.

Lemma 4 The propagation of a reloading wave takes at most n time units.

Proof A time unit is the time taken by a sent message to be received and treated.
Now, the reloading wave is broadcast on a tree, of height at most n. Thus, this
propagation takes at most n time units. □

Theorem 2 A node in $A_i(t)$ cannot create a token.

Proof Each time the token counter reaches $T_m - (n + 1)$, a wave is propagated.
The two lemmas above guarantee that this wave hits any node in $A_i$ in at
most n time units. Now, since it is in $A_i$, either this node has already received
a reloading wave message, or it has received a token since the last wave was
propagated. In both cases, it has reset its timeout to $T_m$ since the last wave
initiation, i.e. during the last $T_m - (n + 1)$ time units. Thus, this timeout, at
the initiation of the wave, is at least n + 1. Then, when the wave reaches
the node, its timeout is $\geq 1$, which makes it impossible for it to create a token
between two successive waves, or between a token visit and the subsequent wave.
Finally, no node in $A_i$ can create a token. □
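The timer arithmetic used in this proof can be spelled out as follows. This is a worked restatement under the synchronicity assumption stated above (a timer is decremented at most once per time unit), not an additional result.

```latex
\begin{align*}
\text{timer at the initiation of the wave} &\geq T_m - \bigl(T_m - (n+1)\bigr) = n + 1,\\
\text{timer when the wave reaches the node} &\geq (n + 1) - n = 1 > 0.
\end{align*}
```

The first line uses the fact that the node reset its timer to $T_m$ at most $T_m - (n+1)$ time units before the wave is launched; the second uses Lemma 4, which bounds the propagation of the wave by n time units.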
Note that the cover property ensures that eventually, $A_i(t) = V$ whp, so
that:

Lemma 5 $A_3 = \{\gamma \in A_1 \cap A_2 \,/\, \bigcup_t A_i(t) = V\}$ is a probabilistic attractor of
$A_1 \cap A_2$. In $A_3$, rule R2 can never be applied.

4.3.5 There is eventually exactly one token 

The key assumption to verify this is that the meeting property of random walks
holds: independent random number generators are needed. Also, when several
tokens are headed to a same node, this node has to be able to detect it with
probability > 0, which is the case if the local treatment time is not negligible
compared with the transmission time, or if messages are buffered for some
non-negligible time before being treated.

Definition 10 A legitimate configuration is a configuration with a single token
t, the table of which represents a spanning tree of the system, and in which all
nodes are hit by a reloading wave before their timers reach the value 0.

$\mathcal{LC} = \{\gamma \in A_3 \,/\, |Token_\gamma| = 1\}$

The following theorem proves that $\mathcal{LC}$ matches the specification of RandTokCirc.

Theorem 3 A configuration $\gamma$ of $\mathcal{LC}$ is such that any execution $(\gamma_0 = \gamma, \gamma_1, \gamma_2, \ldots)$
starting at $\gamma$ verifies

• $\forall k, |Token_{\gamma_k}| = 1$;

• $\forall k, \forall i, \exists l > k, |Token_{\gamma_l}(i)| = 1$ whp.

Proof The closure property of $\mathcal{LC}$, which will be proved in the sequel, proves the
first item. The second item comes from the fact that the successive positions
of the token constitute a random walk, and thus verify the hitting
property. □

Lemma 6 $\mathcal{LC}$ is a probabilistic attractor of $A_3$.

Proof Consider a step $\gamma \to \gamma'$.

If $\gamma \to^{R3(i)} \gamma'$ or $\gamma \to^{R4(i)} \gamma'$, $Token_{\gamma'} = Token_\gamma$. Now, according to Theorem 2,
R2 cannot be activated.

If $\gamma \to^{R1(i)} \gamma'$, $Token_{\gamma'} = Token_\gamma \setminus T \cup \{t\}$, with $|T| \geq 1$.

Thus, $1 \leq |Token_{\gamma'}| \leq |Token_\gamma|$ (this is at least 1 according to attractor
$A_2$), which ensures closure of $\mathcal{LC}$. Now, if $|Token_\gamma| > 1$, the meeting property of
random walks ensures that at some configuration $\gamma' \in \mathcal{E}_\gamma$, several tokens are
headed toward a same node. Then, if the treatment time is not negligible
compared with the transmission time, another token is received with probability > 0
during the treatment of the first token, and those tokens are merged. Thus, whp,
there is some configuration $\gamma'' \in \mathcal{E}_\gamma$ such that $|Token_{\gamma''}| < |Token_\gamma|$.

Finally, eventually, a configuration $\delta$ is reached with $|Token_\delta| = 1$. □

From Lemmas 1, 2 and 6:

Theorem 4 (Convergence and closure) The Algorithm, starting in an
arbitrary configuration, converges to a configuration satisfying $\mathcal{LC}$ whp.

5 The impact of mobility 

The token circulation algorithm presented above is self-stabilizing. Thus, from
any arbitrary configuration occurring because of a topological change, the
algorithm eventually resumes its normal behavior if no further topological change
occurs. If the time between two topological reconfigurations is greater than K
times the convergence time, then the system spends at least $\frac{K-1}{K}$ of the time
in a correct configuration.

The token circulation itself is robust to topological changes, as shown in
Corollary 1. However, we introduced mechanisms to ensure self-stabilization
that can be affected by a topological change. Indeed, the reloading wave is
based on a spanning tree computed in the course of the token circulation. This
spanning tree can contain edges that have failed. In this case, the reloading
wave cannot be propagated to all nodes. The timer of a node not receiving the
reloading wave will then expire, leading to an undue token creation.

In this section, we study the probability that a topological change entails 
such an error. We also define a locally checkable criterion that ensures that no 
error occurs. 

The only non-local topological information used by the algorithm is the
spanning tree contained in the token and in the reloading wave messages. Thus,
a topological modification has an impact only if it makes those trees inconsistent
with the topology. The tree used in reloading wave messages is a subtree of the
tree in the token at the time when the reloading wave is launched (Algorithm 2,
rule R1.c).

Now, the tree in the token is updated each time the token hits a node
(Algorithm 2, rule R1.a). After a topological change, a configuration is illegitimate
if an edge that is in the token tree or in a reloading wave message is removed.
This represents fewer than 2n − 2 edges among the m edges of the graph. The walk
of the token corrects the tree when the token hits the son of this edge in the
tree. Thus, if no reloading wave is broadcast between the time at which the
topological change occurs and the time at which the token hits this node, then
the specification is met.

Thus, a single link disconnection has a probability of at least $\frac{m - (2n - 2)}{m}$ not
to affect the algorithm. If the algorithm reaches an illegitimate configuration,
it still has a probability $P[H_{ji} < T/2]$ to hit the son of the disconnected link,
and thus to correct the tree it contains, before it launches a wave ($H_{ji}$ being the
observed time, starting at node j, to reach node i). Thus, after a topological change,
with probability $P[H_{ij} < T/2]$ (see the computations of this quantity in the
next section), the algorithm continuously meets the specification.

An edge is in the tree if and only if it is the last link through which a node
sent the token. If each node stores the link through which it sent the token
last, the son in the tree of a link that has been disconnected can detect an
illegitimate configuration. The configuration is illegitimate as long as the link
through which a node sent the token is not present: from the link disconnection
to the next visit of the token to the son of the link in the tree (see Figure 2:
i sends the token to j, which is its father until i receives the token again).

Thus, by replacing the statement in Algorithm 2:

    Send Token to j chosen randomly in N(i)

with:

    Choose j at random in N(i)
    Send Token to j
    father_i ← j

a wave propagation can be unsuccessful if and only if a node i is such that
$father_i \notin N_i$, which i can detect.

Figure 2: Local detection of an illegitimate state

Thus, we have:

Property 1 With probability $P[H_{ij} < T/2]$, after a link disconnection,
the algorithm continuously respects the specification. If the system reaches an
illegitimate configuration, a node in the system is aware of that.

6 Timeout tuning 

To solve the communication deadlock problem, the algorithm uses a
decentralized timeout procedure: any processor can produce a new token.
To guarantee the stabilization property, a new mechanism, the reloading wave,
is introduced. The role of this wave, periodically triggered, is to prevent the
creation of unnecessary tokens.

Whatever the value proposed for $T_m$, as soon as this value is greater than n,
the algorithm works correctly. But if $T_m$ is close to n, the reloading wave will
be broadcast too often, and if $T_m$ is too large, an absence of token will take
a great amount of time before being corrected. We address in this section the
problem of computing a good value for this timeout.

No bound can be given on the time a random walk takes to reach a given
node (only results on expected times are available). However, as time goes by,
it becomes improbable that the walk has not reached a node. We first provide
a probabilistic analysis of the waiting time. More precisely, we give a bound on
the probability for a processor to wait for the token more than a certain amount
of time. Then, we provide a criterion to decide a timeout value, based on the
probability that the token is lost knowing that it has not been seen during a
certain amount of time. This quantity depends on the probability that the token
is lost during a transmission.

6.1 Waiting times

The waiting time is the average time a node is waiting for the token. It can be
defined as the return time $h_{ii}$ ($= \frac{2m}{\deg(i)}$, see [Lov93]), i.e. the expected number
of steps for the token, starting at node i, to return to node i for the first time.

It is interesting to measure the probability that a token has returned to a
node after a given time. The probability that the token takes less than a given
number of steps t to come back to the node i is defined by $P[H_{ii} < t]$ ($H_{ii}$
being the observed return time: $H_{ii} > t$ means that it has been more than t
steps since node i has last seen the token). The following results give a more
accurate and comprehensive insight into the time a node will wait for the token
after having released it. In the sequel, we provide a bound on this value.

For the sake of simplicity, we will first study $P[H_{ii} > t] = 1 - P[H_{ii} < t + 1]$.

Notation Let $\sigma[H_{ij}]$ denote the standard deviation of $H_{ij}$ (the number of
steps to reach a node j from i for the first time), and $V[H_{ij}]$ the variance of
$H_{ij}$.

Chebyshev's inequality states that for any $\alpha$:

Lemma 7 (Chebyshev's inequality)

$$P[H_{ii} > h_{ii} + \alpha\,\sigma[H_{ii}]] \leq \frac{1}{\alpha^2}$$

Thus we are led to compute the standard deviation of the hitting time. By
definition, $\sigma[H_{ii}] = \sqrt{V[H_{ii}]}$ with $V[H_{ii}] = E[(H_{ii} - h_{ii})^2]$.

In the sequel, we present an algorithm to compute the variances of the return 
times on a graph, which is necessary to compute the Chebyshev bounds. 

In order to compute the variance of the return time, we need to know the 
variances of all hitting times. First, we state the following result: 

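Lemma 8 (Variance of the number of steps to reach a node)

$$V[H_{ij}] + h_{ij}^2 = \sum_{k \in N(i)} p_{ik}\left(V[H_{kj}] + (h_{kj} + 1)^2\right) \qquad (1)$$

Proof $h_{ij}$ is the average length of the path a random walk starting from i takes
until it reaches j. Thus, since the probability that the random walk reaches j
is 1, the probability of an infinite random path not reaching j is 0, and

$$h_{ij} = \sum_{c \in C_{i \to j}} p(c)\,l(c)$$

with $C_{i \to j}$ the set of all paths from i to j, p(c) the probability that a random
walk follows the path c ($p(c) = \prod_k p_{c_k c_{k+1}}$), and l(c) the length of c.

\begin{align*}
V[H_{ij}] &= \sum_{c \in C_{i \to j}} p(c)\,l(c)^2 - h_{ij}^2 \quad \text{according to a well-known identity}\\
&= \sum_{c_0 c_1 c_2 \ldots \in C_{i \to j}} p_{c_0 c_1}\,p(c_1 c_2 \ldots)\,\bigl(l(c_1 c_2 \ldots) + 1\bigr)^2 - h_{ij}^2\\
&= \sum_{k \in N(i)} p_{ik} \sum_{c \in C_{k \to j}} p(c)\,\bigl(l(c) + 1\bigr)^2 - h_{ij}^2\\
&= \sum_{k \in N(i)} p_{ik}\left(V[H_{kj}] + h_{kj}^2 + 2h_{kj} + 1\right) - h_{ij}^2\\
&= \sum_{k \in N(i)} p_{ik}\left(V[H_{kj}] + (h_{kj} + 1)^2\right) - h_{ij}^2
\end{align*}

□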

The system (1) is linear, and depends on the hitting times. In [BS07], we
have proposed an efficient algorithm to compute the hitting times, with one
matrix inversion. In order to solve the system and obtain the variances, we
have to compute the inverse of a matrix.

Let M(j) be the matrix defined by:

• $M_{il}(j) = p_{il}$ if $i \neq l$ and $i \neq j$;

• $M_{ii}(j) = -1$ if $i \neq j$;

• $M_{jl}(j) = 0$ if $l \neq j$;

• $M_{jj}(j) = 1$.

Let v(j) be a vector defined by $v_i(j) = h_{ij}^2 - \sum_{k \in N(i)} p_{ik}(h_{kj} + 1)^2$ for $i \neq j$
and $v_j(j) = 0$; thus Lemma 8 can be rewritten:

$$M(j)\,V[H_{\cdot j}] = v(j)$$

M(j) being invertible, we can compute the variances by finding its inverse.
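The computation described above can be sketched as follows. Several points are assumptions of ours rather than the method of [BS07]: the walk is the simple random walk on an undirected graph without self-loops ($p_{ik} = 1/\deg(i)$), the linear systems are solved by plain Gauss-Jordan elimination over exact fractions instead of an explicit matrix inversion, and the names `solve` and `return_time_stats` are hypothetical. The return time to j is obtained by applying equation (1) with i = j to the hitting times and variances of the neighbors of j.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def return_time_stats(adj, j):
    """Return (h_jj, V[H_jj]) for the simple random walk on the graph adj."""
    nodes = sorted(adj)
    pos = {u: r for r, u in enumerate(nodes)}
    n = len(nodes)
    one, zero = Fraction(1), Fraction(0)

    # Hitting times to j: h_jj := 0 and h_ij = 1 + sum_k p_ik h_kj for i != j.
    a = [[zero] * n for _ in range(n)]
    b = [zero] * n
    for u in nodes:
        a[pos[u]][pos[u]] = one
        if u != j:
            b[pos[u]] = one
            for k in adj[u]:
                a[pos[u]][pos[k]] -= Fraction(1, len(adj[u]))
    h = solve(a, b)

    # Variances of the hitting times: M(j) V[H_.j] = v(j), row j forcing 0.
    m = [[zero] * n for _ in range(n)]
    v = [zero] * n
    for u in nodes:
        if u == j:
            m[pos[u]][pos[u]] = one
            continue
        p = Fraction(1, len(adj[u]))
        m[pos[u]][pos[u]] = -one
        for k in adj[u]:
            m[pos[u]][pos[k]] += p
        v[pos[u]] = h[pos[u]] ** 2 - sum(p * (h[pos[k]] + 1) ** 2 for k in adj[u])
    var = solve(m, v)

    # Return time to j: one step to a neighbour k, then a hitting time from k.
    p = Fraction(1, len(adj[j]))
    mean = 1 + sum(p * h[pos[k]] for k in adj[j])
    second = sum(p * (var[pos[k]] + (h[pos[k]] + 1) ** 2) for k in adj[j])
    return mean, second - mean ** 2

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
mean, variance = return_time_stats(triangle, 0)
print(mean, variance)   # 3 2
```

On the triangle, the mean return time is 2m/deg = 6/2 = 3, and the variance is that of a geometric law of parameter 1/2, which gives a convenient sanity check.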
From Lemma 7 we have

Corollary 4 Given a time $t > h_{ii}$:

$$P[H_{ii} \leq t] \geq 1 - \frac{V[H_{ii}]}{(t - h_{ii})^2} \qquad (2)$$

Given a probability $\varepsilon$:

$$P\left[H_{ii} \leq h_{ii} + \frac{\sigma[H_{ii}]}{\sqrt{\varepsilon}}\right] \geq 1 - \varepsilon \qquad (3)$$

Expression (2) provides a bound on the probability that the token has come
back before a given time t. With (3), we can have a time after which we are
sure at a given confidence level $1 - \varepsilon$ that the token has come back.

Figure 3: Graph example G

To illustrate the meaning of the previous corollaries, consider the above
graph. The return time is 5 for node 1.

We use Corollary 4 to obtain a good value for the timeout. The variance of
$H_{11}$ in the previous example is 51. Thus, for t = 50, the probability that
node 1 waits less than 50 steps to receive the token after having released it is
more than $1 - \frac{V[H_{11}]}{(t - h_{11})^2} = 1 - \frac{51}{45^2} \approx 1 - \frac{1}{40} = 97.5\%$. To be 99% sure that the token has
returned to 1, we will have to wait $h_{11} + 10 \times \sigma[H_{11}] < 77$ steps.
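The two numerical values of this example can be checked with a few lines. This is a sketch: the function names are ours, and the bounds are the Chebyshev bounds discussed above, applied with the return time 5 and the variance 51 quoted for node 1.

```python
import math

def return_probability_lower_bound(h_ii, variance, t):
    """Chebyshev lower bound on the probability that the token is back by time t."""
    if t <= h_ii:
        return 0.0                       # the bound is void before the mean
    return max(0.0, 1.0 - variance / (t - h_ii) ** 2)

def steps_for_confidence(h_ii, variance, eps):
    """Smallest integer t such that t >= h_ii + sigma / sqrt(eps)."""
    return math.ceil(h_ii + math.sqrt(variance / eps))

bound = return_probability_lower_bound(5, 51, 50)
wait = steps_for_confidence(5, 51, 0.01)
print(round(bound, 4), wait)   # 0.9748 77
```

With $\varepsilon = 1\%$, the factor $1/\sqrt{\varepsilon}$ equals 10, which is the multiplier of the standard deviation used in the text.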

6.2 On timeout for communication deadlock

We take into account possible transient failures which may remove the token
from the network. In this subsection, we give a means for the nodes to detect,
at any confidence level, the loss of the token. We provide a way to choose the best
timeout value.

The longer a node has been waiting for the token, the more likely the token
has disappeared. The suspicion that the token is lost increases with the time
elapsed since the node has seen the token for the last time. A node will have to
check if the token has disappeared and then create a new token if necessary.

We model the possibility that the token disappears by introducing a
probability p that the token disappears at each step: if the token exists at time t, at
time t + 1, the probability that it has disappeared is p and the probability that
it still exists is 1 − p.

6.2.1 Measuring the probability that the token is lost 

We denote $L_t$ the event "at time t, the token is lost". From the previous
subsection, we know a bound on the probability that the token comes back before
a given time, knowing that it still exists. We now want to compute the probability
$P[L_t \mid H_{ii} > t]$ that the token is lost knowing that a node has not seen it in a
given time.

Theorem 5 The probability that the token is lost, knowing that node i has not
seen it in the last t steps, is:

Pm\Hu >t\>l- 2{l-py+H^+pt^2^+^ 

where V[Hii] is the variance of Ha, the number of steps before returning to i 
for the first time. 

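Proof Using the Bayes theorem, we obtain:

\begin{align*}
P[L_t \mid H_{ii} > t] &= 1 - \frac{P[\neg L_t \cap \{H_{ii} > t\}]}{P[H_{ii} > t]}\\
&= 1 - \frac{P[\neg L_t \cap \{H_{ii} > t\}]\,P[\neg L_t]}{P[H_{ii} > t]\,P[\neg L_t]}\\
&= 1 - \frac{P[H_{ii} > t \mid \neg L_t]\,P[\neg L_t]}{P[H_{ii} > t]}
\end{align*}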

According to the previous section, P[Hii > t\-^Lt\ < ^^^"^ ■ 
min{deg}« ^ bound On the probabihty that the token goes forth and back 
between two nodes during t steps. 

We also have: P[Hu > t] > (1 (i _ p)/=_i_^ = 

P) mm{deg}* +P l-(l-p)i - P) 2* +P l-(l-p)l " 

Thus, 



P[Lt\Hu > t] > 1 • 



> 1 • 



> 1 • 



> 1 



> 1 - 



(1 - p)' J, - (1 - ^ + p(l - (1 - p)* J,) 



(l-p)*+i^-(l-p)*+ip^+p 

V[Hii]2*+\l - pY+\1 - 

(1 _p)t+lj2 _^p(22t+l 

□

Figure 4: Probability that the token is lost on graph G according to the elapsed
time on a node and the probability p that the token is lost during a step



y[_ff,,]2*+i(i-p)*+i(i- 



The figure above represents the graph e = f{t) - (i-p)t+H'^+pt^2t+^ ' 

with $V[H_{ii}] = 51$, and p = 0.1, 0.2, ..., 1. To be 95% sure that the token is lost,
we look for the intersection of the curve with $1 - \varepsilon = 0.95$. If p = 0.1, we can
see that we will have to wait for 23 steps; if p = 0.5, 38 steps; and if p = 0.9, 75
steps.



6.2.2 Choosing timeout values 

Choosing t so that 

V[g,,]2*+i(l-p)*+Hl- V) 

(1 -p)* + li2 +pi22t+l 

provides a time after which, if a node has not seen the token, the probability
that it has disappeared is greater than $1 - \varepsilon$.

Theorem 6 (Timeout value) Choosing a timeout greater than 

log^+log(i^)-log£ + 2 
- log(l - p) 

ensures that the token is lost with probability 1 — e. 
Proof
y[HH]2*+i(i-p)*+Hi- V



(1 -p)*+li2 +pf22t+l 



2 ' 



(1 _p)*+lt2 +^^22*+! ^ V[Hu]{l - V) 



2'+i(l -p) 



t+i 



pt^ ^VlHuKl-'-^) 



> 



2*+i - £ 

^ F[//,,](l-i=f) 



<^=> > 

(l_p)t+i- ^ 

(1 -p)* ~ 2sp{l-p) 
«21„g,-Uog(l-p)>log(«±ii) 

When focusing only on $t > h_{ii}$: $\log t > \log h_{ii}$, and if t is such that
$2 \log h_{ii} - t \log(1 - p) > \log C$
then the probability that the token is lost is greater than $1 - \varepsilon$.

□ 

In the above example, with p = 0.1 and e = 1%, we have to set the timeout 
to 33. With e = 10%, the timeout is to be set at 23. 



7 Conclusion 

We have proposed a self-stabilizing token circulation algorithm with no
assumption on the topology of the distributed system. This algorithm can manage all
events related to mobility, most of them without even requiring any
convergence. The (average) convergence time is computed, and the trade-off between
the number of messages and the convergence time is explained.

We now plan on working on the scalability of such solutions, with a
quantitative assessment of the dynamicity of the considered systems.